Accent Issues in Large Vocabulary Continuous Speech Recognition

Authors

  • Chao Huang
  • Tao Chen
  • Eric Chang

Abstract

Speech recognition has improved greatly in recent years. However, robustness remains a major problem: recognition performance fluctuates sharply from speaker to speaker, especially when the speaker has a strong accent that is not covered in the training corpus. In this report, we first present our results from cross-accent experiments, which show a 30% increase in error rate when accent-independent models are used instead of accent-dependent ones. We then organize the report into three parts to address the problem. In the first part, we investigate speaker variability and seek the relationship between the well-known parameter representation and the physical characteristics of the speaker, especially accent, confirming once more that accent is one of the main factors causing speaker variability. We then provide solutions for accent variability from two aspects. One is adaptation, including pronunciation dictionary adaptation and acoustic model adaptation, which integrate the dominant changes among accented speaker groups and the detailed style of a specific speaker in each group. The other is to build accent-specific models, as we do in the cross-accent experiments. The key point of this method is an automatic mechanism for choosing the accent-dependent model, which is explored in the fourth part of the report. We propose a fast and efficient GMM-based accent identification method. The three parts are outlined as follows.

Analysis and modeling of speaker variability, such as gender, accent, age, speaking rate, and phone realizations, are important issues in speech recognition. It is known that existing feature representations describing speaker variations are high dimensional. In the third part of this report, we introduce two powerful multivariate statistical analysis methods, namely principal component analysis (PCA) and independent component analysis (ICA), as tools to analyze such variability and extract low-dimensional feature representations. Our findings are the following: (1) the first two principal components correspond to gender and accent, respectively; (2) ICA-based features yield better classification performance than PCA-based ones. Using a 2-dimensional ICA representation, we achieve error rates of 6.1% and 13.3% in gender and accent classification, respectively, for 980 speakers.

In the fourth part, a method of accent modeling through Pronunciation Dictionary Adaptation (PDA) is presented. We derive the pronunciation variation between canonical speaker groups and accent groups and add an encoding of the differences to a canonical dictionary to create a new, adapted dictionary that reflects the accent characteristics. The pronunciation variation information is then integrated with the acoustic and language models into a one-pass search framework. Acoustic deviation and pronunciation variation are assumed to be independent but complementary phenomena that cause poor performance among accented speakers. Therefore, MLLR, an efficient model adaptation technique, is also presented, both alone and in combination with PDA. When PDA, MLLR, and their combination are used, error rate reductions of 13.9%, 24.1%, and 28.4%, respectively, are achieved.
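
The abstract names a GMM-based accent identification method but gives no details, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes one diagonal-covariance Gaussian mixture is trained per accent and that an utterance is assigned to the accent whose mixture gives the highest average frame log-likelihood. The feature dimension, the number of mixture components, the synthetic data, and all function names (fit_diag_gmm, identify_accent, and so on) are illustrative assumptions.

```python
import numpy as np

def component_log_probs(X, weights, means, variances):
    """Return an (n, k) array of log(weight_j * N(x_i | mean_j, diag var_j))."""
    diff = X[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.log(2 * np.pi * variances)[None]
                        + diff ** 2 / variances[None]).sum(axis=2)
    return np.log(weights)[None, :] + log_gauss

def fit_diag_gmm(X, k, n_iter=50, seed=0):
    """Fit a k-component diagonal-covariance GMM to the rows of X with plain EM."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, size=k, replace=False)]    # initialise from data points
    variances = np.tile(X.var(axis=0) + 1e-6, (k, 1))  # start from the global variance
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame.
        log_p = component_log_probs(X, weights, means, variances)
        resp = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = (resp.T @ (X ** 2)) / nk[:, None] - means ** 2 + 1e-6
    return weights, means, variances

def avg_log_likelihood(X, gmm):
    """Average per-frame log-likelihood of X under a fitted GMM."""
    return float(np.logaddexp.reduce(component_log_probs(X, *gmm), axis=1).mean())

def identify_accent(frames, accent_gmms):
    """Pick the accent whose GMM scores the utterance frames highest."""
    return max(accent_gmms, key=lambda name: avg_log_likelihood(frames, accent_gmms[name]))

# Synthetic stand-in for acoustic feature frames (assumption: 4-D features).
rng = np.random.default_rng(1)
train = {
    "accent_a": rng.normal(loc=0.0, scale=1.0, size=(400, 4)),
    "accent_b": rng.normal(loc=3.0, scale=1.0, size=(400, 4)),
}
accent_gmms = {name: fit_diag_gmm(X, k=2) for name, X in train.items()}

test_utterance = rng.normal(loc=3.0, scale=1.0, size=(50, 4))
predicted = identify_accent(test_utterance, accent_gmms)
print(predicted)
```

In a setup like the one the abstract describes, the identified accent would then select which accent-dependent recognition model to use; that selection step is not shown here.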

Similar Resources

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...

Speech Recognition of European Languages

A basic overview is presented of the main ongoing efforts in large vocabulary, continuous speech recognition (LVCSR) for European languages. We address issues in acoustic modeling, lexical representation, and language modeling for several European languages, as well as issues in comparative evaluation.

Acoustic and Lexical Modeling Techniques for Accented Speech Recognition

Speech interfaces are becoming pervasive among the general public with the prevalence of smartphones and cloud-based computing. This pushes Automatic Speech Recognition (ASR) systems to handle a wide range of environments, including different channels, noise conditions, and speakers with varying accents. This thesis focuses on the impact of speakers’ accents on the ASR models and techniques to make...

Using accent-specific pronunciation modelling for improved large vocabulary continuous speech recognition

A method of modelling accent-specific pronunciation variations is presented. Speech from an unseen accent group is phonetically transcribed such that pronunciation variations may be derived. These context-dependent variations are clustered in decision trees which are used as a model of the pronunciation variation associated with this new accent group. The trees are then used to build a new pron...

Acoustic Modeling of Accented English Speech for Large-vocabulary Speech Recognition

In this paper, we present a study on robust speech recognition with respect to accent variations. Differences that characterize accents in speech can be divided into two parts: phonetic and acoustic. We focus on the acoustic differences and the ways of acoustic model design and training that can be used to minimize the effect of accent variations on the speech recognition system’s performance. ...

Journal:
  • I. J. Speech Technology

Volume 7

Publication date: 2004